The purpose of this document is to illustrate time series analysis and forecasting. We will use a simulated dataset to analyze things like visits, discharges and payments. To perform these analyses we will be following the modeltime workflow. This report will be broken down into sections that follow that same workflow.
Lets take a look at our data and see what it has.
df_tbl %>%
glimpse()## Rows: 25,279
## Columns: 12
## $ mrn <chr> "66727914", "84487881", "68427598", "39652414"~
## $ visit_id <chr> "1283065398", "1171004549", "1951203647", "149~
## $ visit_start_date_time <dttm> 2011-12-26 01:14:00, 2011-12-31 05:00:00, 201~
## $ visit_end_date_time <dttm> 2012-01-01 12:06:00, 2012-01-01 12:51:00, 201~
## $ total_charge_amount <dbl> 62580.61, 38466.48, 31758.50, 14699.61, 66096.~
## $ total_adjustment_amount <dbl> -39117.58, -26930.67, -23706.23, -10841.80, -7~
## $ total_payment_amount <dbl> -23463.03, -11535.81, -8052.27, -3857.81, -587~
## $ payer_grouping <chr> "Commercial", "Blue Cross", "Blue Cross", "Blu~
## $ service_line <chr> "Medical", "Surgical", "Medical", "Chest Pain"~
## $ ip_op_flag <chr> "I", "I", "I", "I", "I", "O", "I", "I", "O", "~
## $ adm_date <date> 2011-12-26, 2011-12-31, 2011-12-28, 2011-12-3~
## $ dsch_date <date> 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-0~
skim(df_tbl)| Name | df_tbl |
| Number of rows | 25279 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| Date | 2 |
| numeric | 3 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| mrn | 0 | 1 | 8 | 8 | 0 | 16789 | 0 |
| visit_id | 0 | 1 | 10 | 10 | 0 | 25279 | 0 |
| payer_grouping | 0 | 1 | 10 | 10 | 0 | 2 | 0 |
| service_line | 0 | 1 | 2 | 44 | 0 | 27 | 0 |
| ip_op_flag | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| adm_date | 0 | 1 | 2011-12-19 | 2019-12-31 | 2015-05-27 | 2916 |
| dsch_date | 0 | 1 | 2012-01-01 | 2019-12-31 | 2015-05-29 | 2887 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| total_charge_amount | 0 | 1.00 | 34260.35 | 48285.83 | 0.5 | 10847.83 | 19475.11 | 39463.87 | 1109001.99 | ▇▁▁▁▁ |
| total_adjustment_amount | 0 | 1.00 | -22550.46 | 36053.24 | -914728.0 | -25220.65 | -11619.57 | -6951.65 | 63627.62 | ▁▁▁▁▇ |
| total_payment_amount | 586 | 0.98 | -11584.37 | 18165.44 | -495119.8 | -13132.33 | -5999.98 | -2920.08 | 436.01 | ▁▁▁▁▇ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| visit_start_date_time | 0 | 1 | 2011-12-19 21:33:00 | 2019-12-31 05:00:00 | 2015-05-27 01:55:00 | 16731 |
| visit_end_date_time | 0 | 1 | 2012-01-01 12:06:00 | 2019-12-31 22:58:00 | 2015-05-29 00:00:00 | 15184 |
Our objectives are to:
Our forecasting will focus on a grouped forecast where we are going to forecast the number of discharges by inpatient/outpatient visit type and by payer grouping.
We are going to do this on a weekly scale.
df_tblsummarise_by_time() with .by = "week", and n() the visits.transactions_weekly_tbltransactions_weekly_tbl <- df_tbl %>%
filter(payer_grouping != "?") %>%
group_by(ip_op_flag, payer_grouping) %>%
summarise_by_time(
.date_var = dsch_date
, .by = "week"
, value = n()
)
transactions_weekly_tbl## # A tibble: 1,667 x 4
## # Groups: ip_op_flag, payer_grouping [4]
## ip_op_flag payer_grouping dsch_date value
## <chr> <chr> <date> <int>
## 1 I Blue Cross 2012-01-01 37
## 2 I Blue Cross 2012-01-08 31
## 3 I Blue Cross 2012-01-15 36
## 4 I Blue Cross 2012-01-22 32
## 5 I Blue Cross 2012-01-29 40
## 6 I Blue Cross 2012-02-05 38
## 7 I Blue Cross 2012-02-12 31
## 8 I Blue Cross 2012-02-19 37
## 9 I Blue Cross 2012-02-26 38
## 10 I Blue Cross 2012-03-04 31
## # ... with 1,657 more rows
Use plot_time_series() to visualize the discharges.
log() transformation to see the effect on the time seriestransactions_weekly_tbl %>%
plot_time_series(
.date_var = dsch_date
, .color_var = payer_grouping
, .facet_vars = payer_grouping
, .facet_ncol = 2
, .value = log(value)
, .smooth = FALSE
)Visualize the ACF using plot_acf_diagnostics() using a log() transformation. Look for:
transactions_weekly_tbl %>%
ungroup() %>%
plot_acf_diagnostics(dsch_date, log(value))transactions_weekly_tbllog()standardize_vec()transactions_trans_weekly_tbltransactions_trans_weekly_tbl <- transactions_weekly_tbl %>%
mutate(value = log(value)) %>%
mutate(value = standardize_vec(value))## Standardization Parameters
## mean: 3.08875144281386
## standard deviation: 0.367674566335952
## Standardization Parameters
## mean: 1.83577890003612
## standard deviation: 0.545791389303644
## Standardization Parameters
## mean: 3.15330156564258
## standard deviation: 0.302421031976675
## Standardization Parameters
## mean: 1.59951348649452
## standard deviation: 0.514947645076106
mean_a <- 3.08875144281386
sd_a <- 0.367674566335952
mean_b <- 1.83577890003612
sd_b <- 0.545791389303644
mean_c <- 3.15330156564258
sd_c <- 0.302421031976675
mean_d <- 1.59951348649452
sd_d <- 0.514947645076106Visualize the log-standardized transactions using plot_time_series(). This confirms the transformation was performed successfully.
transactions_trans_weekly_tbl %>%
plot_time_series(
.date_var = dsch_date
, .color_var = payer_grouping
, .facet_vars = payer_grouping
, .facet_ncol = 2
, .value = value
, .smooth = FALSE
)We’ll use these parameters to create our “full dataset”. We’ve selected an 14-week forecast horizon. Our lag period is 14 weeks and we’ll try out a few rolling averages at various aggregations.
horizon <- 14
lag_period <- 14
rolling_periods <- c(7, 14, 28, 52)transactions_weekly_tblbind_rows() and future_frame() to extend the data frame .length_out = horizon.tk_augment_lags() to add a .lags = lag_periodtk_agument_slidify() to add .period = rolling_periods. Use mean as the rolling function. Make sure to “center” with “partial” windows.full_tbl.full_tbl <- transactions_trans_weekly_tbl %>%
# Add future window
bind_rows(
future_frame(
.data = .
, .date_var = dsch_date
, .length_out = horizon
)
) %>%
# Add autocorrelated lags
tk_augment_lags(value, .lags = lag_period) %>%
# Add rolling features
tk_augment_slidify(
.value = value_lag14,
.f = mean,
.period = rolling_periods,
.align = "center",
.partial = TRUE
) %>%
# Rename columns
rename_with(
.cols = contains("lag")
, .fn = ~ str_c("lag_", .)
) %>%
select(dsch_date, everything())
full_tbl %>%
glimpse()## Rows: 1,723
## Columns: 9
## Groups: lag_ip_op_flag, payer_grouping [4]
## $ dsch_date <date> 2012-01-01, 2012-01-08, 2012-01-15, 2012-01-2~
## $ lag_ip_op_flag <chr> "I", "I", "I", "I", "I", "I", "I", "I", "I", "~
## $ payer_grouping <chr> "Blue Cross", "Blue Cross", "Blue Cross", "Blu~
## $ value <dbl> 1.4201865, 0.9389710, 1.3456669, 1.0253210, 1.~
## $ lag_value_lag14 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ lag_value_lag14_roll_7 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ lag_value_lag14_roll_14 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ lag_value_lag14_roll_28 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ lag_value_lag14_roll_52 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
Visualize the features, and review what you see.
full_tblpivot_longer every column except “dsch_date”plot_time_series() to visualize the time series coloring by “name”.Review the visualization selecting one feature at a time and answering the following questions:
- Do the rolling lags present any issues?
- Which rolling lag captures the trend the best?
- Do you expect either of the Product Events features to help?
full_tbl %>%
pivot_longer(cols = -c(dsch_date, lag_ip_op_flag, payer_grouping)) %>%
plot_time_series(
dsch_date
, value
, name
, .smooth = FALSE
, .facet_ncol = 2
)Create a data_prepared_tbl by filtering full_tbl where “value” is non-missing.
data_prepared_tbl <- full_tbl %>%
filter(!is.na(value))
data_prepared_tbl## # A tibble: 1,667 x 9
## # Groups: lag_ip_op_flag, payer_grouping [4]
## dsch_date lag_ip_op_flag payer_grouping value lag_value_lag14
## <date> <chr> <chr> <dbl> <dbl>
## 1 2012-01-01 I Blue Cross 1.42 NA
## 2 2012-01-08 I Blue Cross 0.939 NA
## 3 2012-01-15 I Blue Cross 1.35 NA
## 4 2012-01-22 I Blue Cross 1.03 NA
## 5 2012-01-29 I Blue Cross 1.63 NA
## 6 2012-02-05 I Blue Cross 1.49 NA
## 7 2012-02-12 I Blue Cross 0.939 NA
## 8 2012-02-19 I Blue Cross 1.42 NA
## 9 2012-02-26 I Blue Cross 1.49 NA
## 10 2012-03-04 I Blue Cross 0.939 NA
## # ... with 1,657 more rows, and 4 more variables: lag_value_lag14_roll_7 <dbl>,
## # lag_value_lag14_roll_14 <dbl>, lag_value_lag14_roll_28 <dbl>,
## # lag_value_lag14_roll_52 <dbl>
Create a forecast_tbl by filtering full_tbl where “value” is missing.
forecast_tbl <- full_tbl %>%
filter(is.na(value))
forecast_tbl## # A tibble: 56 x 9
## # Groups: lag_ip_op_flag, payer_grouping [4]
## dsch_date lag_ip_op_flag payer_grouping value lag_value_lag14
## <date> <chr> <chr> <dbl> <dbl>
## 1 2020-01-05 I Blue Cross NA -1.22
## 2 2020-01-12 I Blue Cross NA -0.393
## 3 2020-01-19 I Blue Cross NA -0.860
## 4 2020-01-26 I Blue Cross NA -0.695
## 5 2020-02-02 I Blue Cross NA -1.88
## 6 2020-02-09 I Blue Cross NA -1.42
## 7 2020-02-16 I Blue Cross NA -1.42
## 8 2020-02-23 I Blue Cross NA -2.14
## 9 2020-03-01 I Blue Cross NA -0.860
## 10 2020-03-08 I Blue Cross NA -1.42
## # ... with 46 more rows, and 4 more variables: lag_value_lag14_roll_7 <dbl>,
## # lag_value_lag14_roll_14 <dbl>, lag_value_lag14_roll_28 <dbl>,
## # lag_value_lag14_roll_52 <dbl>
data_prepared_tbltime_series_split() to create a single time series split.
assess = horizon to get the last 14-weeks of data as testing data.cumulative = TRUE to use all of the previous data as training data.splitssplits <- data_prepared_tbl %>%
time_series_split(assess = horizon, cumulative = TRUE)Make a preprocessing recipe using recipe(). Note - It may help to prep() and juice() your recipe to see the effect of your transformations.
recipe() using “value ~ .” and data = training(splits)step_timeseries_signature() using the date featurestep_normalize().step_dummy(). Set one_hot = TRUE.recipe_spec_base <- recipe(
value ~ .
, data = training(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date)
) %>%
# Time Series Signature
step_timeseries_signature(dsch_date) %>%
step_rm(matches("(iso)|(xts)|(hour)|(minute)|(second)|(am.pm)")) %>%
# Standardization
step_normalize(matches("(index.num)|(year)|(yday)")) %>%
# Dummy Encoding (One Hot Encoding)
step_dummy(all_nominal(), one_hot = TRUE) %>%
# Fourier - 7 Week ACF
step_fourier(dsch_date, period = c(7, 14, 52), K = 2)
recipe_spec_base %>%
prep() %>%
juice() %>%
glimpse()## Rows: 1,612
## Columns: 58
## $ dsch_date <date> 2012-01-01, 2012-01-08, 2012-01-15, 2012-01~
## $ lag_value_lag14 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ lag_value_lag14_roll_7 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ lag_value_lag14_roll_14 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ lag_value_lag14_roll_28 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ lag_value_lag14_roll_52 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ value <dbl> 1.4201865, 0.9389710, 1.3456669, 1.0253210, ~
## $ dsch_date_index.num <dbl> -1.725001, -1.716430, -1.707859, -1.699288, ~
## $ dsch_date_year <dbl> -1.505993, -1.505993, -1.505993, -1.505993, ~
## $ dsch_date_half <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_quarter <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,~
## $ dsch_date_month <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,~
## $ dsch_date_day <int> 1, 8, 15, 22, 29, 5, 12, 19, 26, 4, 11, 18, ~
## $ dsch_date_wday <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_mday <int> 1, 8, 15, 22, 29, 5, 12, 19, 26, 4, 11, 18, ~
## $ dsch_date_qday <int> 1, 8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78~
## $ dsch_date_yday <dbl> -1.70440843, -1.63725147, -1.57009451, -1.50~
## $ dsch_date_mweek <int> 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1,~
## $ dsch_date_week <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1~
## $ dsch_date_week2 <int> 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,~
## $ dsch_date_week3 <int> 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0,~
## $ dsch_date_week4 <int> 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3,~
## $ dsch_date_mday7 <int> 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,~
## $ lag_ip_op_flag_I <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ lag_ip_op_flag_O <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ payer_grouping_Blue.Cross <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ payer_grouping_Commercial <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_01 <dbl> 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_02 <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_03 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,~
## $ dsch_date_month.lbl_04 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,~
## $ dsch_date_month.lbl_05 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_06 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_07 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_08 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_09 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_11 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_wday.lbl_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_6 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_7 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_sin7_K1 <dbl> 0.37526700, 0.95866785, 0.82017225, 0.064070~
## $ dsch_date_cos7_K1 <dbl> 0.9269168, 0.2845276, -0.5721167, -0.9979454~
## $ dsch_date_sin7_K2 <dbl> 0.6956826, 0.5455349, -0.9384684, -0.1278772~
## $ dsch_date_cos7_K2 <dbl> 0.71834935, -0.83808810, -0.34536505, 0.9917~
## $ dsch_date_sin14_K1 <dbl> -0.1911586, -0.5981105, -0.8865993, -0.99948~
## $ dsch_date_cos14_K1 <dbl> -0.98155916, -0.80141362, -0.46253829, -0.03~
## $ dsch_date_sin14_K2 <dbl> 0.37526700, 0.95866785, 0.82017225, 0.064070~
## $ dsch_date_cos14_K2 <dbl> 0.9269168, 0.2845276, -0.5721167, -0.9979454~
## $ dsch_date_sin52_K1 <dbl> 0.78183148, 0.85128444, 0.90832376, 0.952117~
## $ dsch_date_cos52_K1 <dbl> 0.62348980, 0.52470449, 0.41826780, 0.305731~
## $ dsch_date_sin52_K2 <dbl> 0.9749279, 0.8933455, 0.7598452, 0.5821853, ~
## $ dsch_date_cos52_K2 <dbl> -0.22252093, -0.44937040, -0.65010409, -0.81~
Use plot_time_series_regression to test out several natural splines:
splines::ns() with degrees of freedom 1, 2, 3, and 4.Which value of df would you select?
data_prepared_tbl %>%
plot_time_series_regression(
.date_var = dsch_date,
.formula = value ~ splines::ns(dsch_date, df = 3),
.show_summary = FALSE,
.facet_ncol = 2
)Create a model specification for linear regression:
linear_reg() functionset_engine("lm")model_spec_lmmodel_spec_lm <- linear_reg() %>%
set_engine("lm")Create a recipe for the spline model.
recipe_spec_basedeg_free = 3recipe_spec_1_splinerecipe_spec_1_spline <- recipe_spec_base %>%
step_rm(dsch_date) %>%
step_ns(ends_with("index.num"), deg_free = 3) %>%
step_rm(starts_with("lag_"))
recipe_spec_1_spline %>%
prep() %>%
juice() %>%
glimpse()## Rows: 1,612
## Columns: 52
## $ value <dbl> 1.4201865, 0.9389710, 1.3456669, 1.0253210, ~
## $ dsch_date_year <dbl> -1.505993, -1.505993, -1.505993, -1.505993, ~
## $ dsch_date_half <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_quarter <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,~
## $ dsch_date_month <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,~
## $ dsch_date_day <int> 1, 8, 15, 22, 29, 5, 12, 19, 26, 4, 11, 18, ~
## $ dsch_date_wday <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_mday <int> 1, 8, 15, 22, 29, 5, 12, 19, 26, 4, 11, 18, ~
## $ dsch_date_qday <int> 1, 8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78~
## $ dsch_date_yday <dbl> -1.70440843, -1.63725147, -1.57009451, -1.50~
## $ dsch_date_mweek <int> 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1,~
## $ dsch_date_week <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1~
## $ dsch_date_week2 <int> 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,~
## $ dsch_date_week3 <int> 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0,~
## $ dsch_date_week4 <int> 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3,~
## $ dsch_date_mday7 <int> 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,~
## $ payer_grouping_Blue.Cross <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ payer_grouping_Commercial <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_01 <dbl> 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_02 <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_03 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,~
## $ dsch_date_month.lbl_04 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,~
## $ dsch_date_month.lbl_05 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_06 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_07 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_08 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_09 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_11 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_wday.lbl_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_6 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_7 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_sin7_K1 <dbl> 0.37526700, 0.95866785, 0.82017225, 0.064070~
## $ dsch_date_cos7_K1 <dbl> 0.9269168, 0.2845276, -0.5721167, -0.9979454~
## $ dsch_date_sin7_K2 <dbl> 0.6956826, 0.5455349, -0.9384684, -0.1278772~
## $ dsch_date_cos7_K2 <dbl> 0.71834935, -0.83808810, -0.34536505, 0.9917~
## $ dsch_date_sin14_K1 <dbl> -0.1911586, -0.5981105, -0.8865993, -0.99948~
## $ dsch_date_cos14_K1 <dbl> -0.98155916, -0.80141362, -0.46253829, -0.03~
## $ dsch_date_sin14_K2 <dbl> 0.37526700, 0.95866785, 0.82017225, 0.064070~
## $ dsch_date_cos14_K2 <dbl> 0.9269168, 0.2845276, -0.5721167, -0.9979454~
## $ dsch_date_sin52_K1 <dbl> 0.78183148, 0.85128444, 0.90832376, 0.952117~
## $ dsch_date_cos52_K1 <dbl> 0.62348980, 0.52470449, 0.41826780, 0.305731~
## $ dsch_date_sin52_K2 <dbl> 0.9749279, 0.8933455, 0.7598452, 0.5821853, ~
## $ dsch_date_cos52_K2 <dbl> -0.22252093, -0.44937040, -0.65010409, -0.81~
## $ dsch_date_index.num_ns_1 <dbl> 0.000000000, -0.001897055, -0.003793515, -0.~
## $ dsch_date_index.num_ns_2 <dbl> 0.00000000, 0.00567732, 0.01135410, 0.017029~
## $ dsch_date_index.num_ns_3 <dbl> 0.000000000, -0.003780196, -0.007560030, -0.~
Create a workflow for the linear regression and preprocessing recipe:
workflow()add_model() to add the model_spec_lmadd_recipe() to add the recipe_spec_1_splineworkflow_fit_lm_1_splineworkflow_fit_lm_1_spline <- workflow() %>%
add_model(model_spec_lm) %>%
add_recipe(recipe_spec_1_spline) %>%
fit(
training(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date)
)
workflow_fit_lm_1_spline %>%
pull_workflow_fit() %>%
pluck("fit") %>%
summary()##
## Call:
## stats::lm(formula = ..y ~ ., data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14714 -0.47023 0.09601 0.61494 2.20079
##
## Coefficients: (16 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.391e+02 1.572e+02 0.885 0.376363
## dsch_date_year 5.983e+01 9.928e+01 0.603 0.546841
## dsch_date_half -3.409e-01 6.905e-01 -0.494 0.621550
## dsch_date_quarter -3.162e+01 1.726e+01 -1.832 0.067152 .
## dsch_date_month 9.036e+00 6.051e+00 1.493 0.135548
## dsch_date_day 3.147e-01 2.035e-01 1.547 0.122166
## dsch_date_wday NA NA NA NA
## dsch_date_mday NA NA NA NA
## dsch_date_qday -3.409e-01 1.894e-01 -1.799 0.072140 .
## dsch_date_yday 1.425e+01 9.278e+00 1.536 0.124722
## dsch_date_mweek -9.362e-02 2.597e-02 -3.606 0.000321 ***
## dsch_date_week -1.073e-01 1.098e-01 -0.977 0.328623
## dsch_date_week2 -2.556e-02 4.984e-02 -0.513 0.608080
## dsch_date_week3 -4.175e-02 2.726e-02 -1.531 0.125861
## dsch_date_week4 -1.848e-02 2.376e-02 -0.778 0.436946
## dsch_date_mday7 -8.973e-02 8.401e-02 -1.068 0.285659
## payer_grouping_Blue.Cross 3.459e-02 4.331e-02 0.799 0.424646
## payer_grouping_Commercial NA NA NA NA
## dsch_date_month.lbl_01 -2.824e-02 1.048e+00 -0.027 0.978515
## dsch_date_month.lbl_02 -1.158e-01 8.339e-01 -0.139 0.889570
## dsch_date_month.lbl_03 -1.352e+00 8.120e-01 -1.665 0.096166 .
## dsch_date_month.lbl_04 -3.725e-01 5.295e-01 -0.704 0.481838
## dsch_date_month.lbl_05 -6.017e-01 3.283e-01 -1.833 0.067058 .
## dsch_date_month.lbl_06 NA NA NA NA
## dsch_date_month.lbl_07 7.636e-01 6.657e-01 1.147 0.251486
## dsch_date_month.lbl_08 5.745e-01 3.505e-01 1.639 0.101378
## dsch_date_month.lbl_09 NA NA NA NA
## dsch_date_month.lbl_10 -1.550e-01 3.018e-01 -0.513 0.607676
## dsch_date_month.lbl_11 NA NA NA NA
## dsch_date_month.lbl_12 NA NA NA NA
## dsch_date_wday.lbl_1 NA NA NA NA
## dsch_date_wday.lbl_2 NA NA NA NA
## dsch_date_wday.lbl_3 NA NA NA NA
## dsch_date_wday.lbl_4 NA NA NA NA
## dsch_date_wday.lbl_5 NA NA NA NA
## dsch_date_wday.lbl_6 NA NA NA NA
## dsch_date_wday.lbl_7 NA NA NA NA
## dsch_date_sin7_K1 -3.802e-04 3.071e-02 -0.012 0.990124
## dsch_date_cos7_K1 -5.333e-03 3.093e-02 -0.172 0.863125
## dsch_date_sin7_K2 -2.066e-02 3.072e-02 -0.673 0.501335
## dsch_date_cos7_K2 1.856e-02 3.076e-02 0.603 0.546463
## dsch_date_sin14_K1 -2.950e-04 3.139e-02 -0.009 0.992504
## dsch_date_cos14_K1 -2.498e-02 3.109e-02 -0.803 0.421892
## dsch_date_sin14_K2 NA NA NA NA
## dsch_date_cos14_K2 NA NA NA NA
## dsch_date_sin52_K1 7.225e-02 2.104e-01 0.343 0.731305
## dsch_date_cos52_K1 5.115e-02 1.983e-01 0.258 0.796447
## dsch_date_sin52_K2 -2.983e-01 1.009e-01 -2.956 0.003160 **
## dsch_date_cos52_K2 2.788e-01 9.845e-02 2.831 0.004695 **
## dsch_date_index.num_ns_1 -1.194e+02 1.960e+02 -0.609 0.542524
## dsch_date_index.num_ns_2 -2.437e+02 4.009e+02 -0.608 0.543374
## dsch_date_index.num_ns_3 -1.686e+02 2.786e+02 -0.605 0.545265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8695 on 1576 degrees of freedom
## Multiple R-squared: 0.227, Adjusted R-squared: 0.2099
## F-statistic: 13.23 on 35 and 1576 DF, p-value: < 2.2e-16
Create a recipe for the spline model.
recipe_spec_baserecipe_spec_2_lagrecipe_spec_2_lag <- recipe_spec_base %>%
step_rm(dsch_date) %>%
step_naomit(starts_with("lag_"))
recipe_spec_2_lag %>%
prep() %>%
juice() %>%
glimpse()## Rows: 1,456
## Columns: 57
## $ lag_value_lag14 <dbl> 0.5632302, 1.0253210, 1.9525720, 1.7649254, ~
## $ lag_value_lag14_roll_7 <dbl> 1.3486589, 1.4378583, 1.4847274, 1.5128270, ~
## $ lag_value_lag14_roll_14 <dbl> 1.6262373, 1.5761485, 1.5381210, 1.4190124, ~
## $ lag_value_lag14_roll_28 <dbl> 1.5171480, 1.5142482, 1.4748632, 1.4469190, ~
## $ lag_value_lag14_roll_52 <dbl> 1.308105, 1.297136, 1.297136, 1.295662, 1.28~
## $ value <dbl> 0.9389710, 1.1090137, 0.4605842, 0.2428843, ~
## $ dsch_date_index.num <dbl> -1.390734, -1.382163, -1.373592, -1.365021, ~
## $ dsch_date_year <dbl> -1.505993, -1.505993, -1.505993, -1.505993, ~
## $ dsch_date_half <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,~
## $ dsch_date_quarter <int> 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1,~
## $ dsch_date_month <int> 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 1~
## $ dsch_date_day <int> 30, 7, 14, 21, 28, 4, 11, 18, 25, 2, 9, 16, ~
## $ dsch_date_wday <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_mday <int> 30, 7, 14, 21, 28, 4, 11, 18, 25, 2, 9, 16, ~
## $ dsch_date_qday <int> 92, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 7~
## $ dsch_date_yday <dbl> 0.9147130, 0.9818700, 1.0490270, 1.1161839, ~
## $ dsch_date_mweek <int> 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1,~
## $ dsch_date_week <int> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ~
## $ dsch_date_week2 <int> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1,~
## $ dsch_date_week3 <int> 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 1,~
## $ dsch_date_week4 <int> 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 1,~
## $ dsch_date_mday7 <int> 5, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1,~
## $ lag_ip_op_flag_I <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ lag_ip_op_flag_O <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ payer_grouping_Blue.Cross <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ payer_grouping_Commercial <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_01 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
## $ dsch_date_month.lbl_02 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_03 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_04 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_05 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_06 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_07 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_08 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_09 <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_10 <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_11 <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_month.lbl_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,~
## $ dsch_date_wday.lbl_1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ dsch_date_wday.lbl_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_6 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_wday.lbl_7 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ dsch_date_sin7_K1 <dbl> -0.74027800, -0.98718178, -0.49071755, 0.375~
## $ dsch_date_cos7_K1 <dbl> -0.6723009, 0.1595999, 0.8713187, 0.9269168,~
## $ dsch_date_sin7_K2 <dbl> 0.9953791, -0.3151082, -0.8551428, 0.6956826~
## $ dsch_date_cos7_K2 <dbl> -0.09602303, -0.94905575, 0.51839257, 0.7183~
## $ dsch_date_sin14_K1 <dbl> 0.9144126, 0.6482284, 0.2536546, -0.1911586,~
## $ dsch_date_cos14_K1 <dbl> -0.40478334, -0.76144596, -0.96729486, -0.98~
## $ dsch_date_sin14_K2 <dbl> -0.74027800, -0.98718178, -0.49071755, 0.375~
## $ dsch_date_cos14_K2 <dbl> -0.6723009, 0.1595999, 0.8713187, 0.9269168,~
## $ dsch_date_sin52_K1 <dbl> -0.62348980, -0.52470449, -0.41826780, -0.30~
## $ dsch_date_cos52_K1 <dbl> 0.78183148, 0.85128444, 0.90832376, 0.952117~
## $ dsch_date_sin52_K2 <dbl> -0.9749279, -0.8933455, -0.7598452, -0.58218~
## $ dsch_date_cos52_K2 <dbl> 0.22252093, 0.44937040, 0.65010409, 0.813056~
Save the workflow as workflow_fit_lm_2_lag.
workflow_fit_lm_2_lag <- workflow() %>%
add_model(model_spec_lm) %>%
add_recipe(recipe_spec_2_lag) %>%
fit(
training(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
)
)
workflow_fit_lm_2_lag %>%
pull_workflow_fit() %>%
pluck("fit") %>%
summary()##
## Call:
## stats::lm(formula = ..y ~ ., data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3032 -0.4351 0.0837 0.5198 2.2314
##
## Coefficients: (17 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.330279 27.355992 1.474 0.14063
## lag_value_lag14 0.014434 0.027543 0.524 0.60033
## lag_value_lag14_roll_7 0.020367 0.102685 0.198 0.84281
## lag_value_lag14_roll_14 -1.249247 0.173674 -7.193 1.02e-12 ***
## lag_value_lag14_roll_28 2.118667 0.249992 8.475 < 2e-16 ***
## lag_value_lag14_roll_52 0.106873 0.196070 0.545 0.58579
## dsch_date_index.num 27.014865 93.172930 0.290 0.77190
## dsch_date_year -26.981331 93.230247 -0.289 0.77231
## dsch_date_half 0.289908 0.719994 0.403 0.68726
## dsch_date_quarter -45.201588 20.849841 -2.168 0.03033 *
## dsch_date_month 13.621776 6.880147 1.980 0.04791 *
## dsch_date_day 0.468731 0.231216 2.027 0.04282 *
## dsch_date_wday NA NA NA NA
## dsch_date_mday NA NA NA NA
## dsch_date_qday -0.488245 0.228783 -2.134 0.03301 *
## dsch_date_yday 1.201618 8.843230 0.136 0.89194
## dsch_date_mweek -0.081427 0.026315 -3.094 0.00201 **
## dsch_date_week -0.012900 0.085225 -0.151 0.87971
## dsch_date_week2 -0.017711 0.048256 -0.367 0.71366
## dsch_date_week3 -0.046949 0.026290 -1.786 0.07435 .
## dsch_date_week4 -0.021719 0.022890 -0.949 0.34287
## dsch_date_mday7 -0.111697 0.079606 -1.403 0.16080
## lag_ip_op_flag_I -0.062301 0.041689 -1.494 0.13529
## lag_ip_op_flag_O NA NA NA NA
## payer_grouping_Blue.Cross -0.028254 0.041756 -0.677 0.49874
## payer_grouping_Commercial NA NA NA NA
## dsch_date_month.lbl_01 -0.455870 1.019805 -0.447 0.65493
## dsch_date_month.lbl_02 -0.369707 0.803986 -0.460 0.64570
## dsch_date_month.lbl_03 -1.877161 0.871973 -2.153 0.03150 *
## dsch_date_month.lbl_04 -0.443654 0.519075 -0.855 0.39286
## dsch_date_month.lbl_05 -0.770782 0.339497 -2.270 0.02333 *
## dsch_date_month.lbl_06 NA NA NA NA
## dsch_date_month.lbl_07 0.222867 0.681106 0.327 0.74356
## dsch_date_month.lbl_08 0.282726 0.356606 0.793 0.42801
## dsch_date_month.lbl_09 NA NA NA NA
## dsch_date_month.lbl_10 -0.248225 0.306513 -0.810 0.41817
## dsch_date_month.lbl_11 NA NA NA NA
## dsch_date_month.lbl_12 NA NA NA NA
## dsch_date_wday.lbl_1 NA NA NA NA
## dsch_date_wday.lbl_2 NA NA NA NA
## dsch_date_wday.lbl_3 NA NA NA NA
## dsch_date_wday.lbl_4 NA NA NA NA
## dsch_date_wday.lbl_5 NA NA NA NA
## dsch_date_wday.lbl_6 NA NA NA NA
## dsch_date_wday.lbl_7 NA NA NA NA
## dsch_date_sin7_K1 0.013250 0.029561 0.448 0.65407
## dsch_date_cos7_K1 0.016051 0.029711 0.540 0.58911
## dsch_date_sin7_K2 -0.011116 0.029603 -0.376 0.70734
## dsch_date_cos7_K2 0.014063 0.029670 0.474 0.63558
## dsch_date_sin14_K1 0.008868 0.029917 0.296 0.76696
## dsch_date_cos14_K1 -0.022535 0.029699 -0.759 0.44811
## dsch_date_sin14_K2 NA NA NA NA
## dsch_date_cos14_K2 NA NA NA NA
## dsch_date_sin52_K1 0.067239 0.205373 0.327 0.74342
## dsch_date_cos52_K1 0.050818 0.194390 0.261 0.79381
## dsch_date_sin52_K2 -0.167646 0.099324 -1.688 0.09166 .
## dsch_date_cos52_K2 0.301338 0.097004 3.106 0.00193 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7948 on 1416 degrees of freedom
## Multiple R-squared: 0.3128, Adjusted R-squared: 0.2939
## F-statistic: 16.53 on 39 and 1416 DF, p-value: < 2.2e-16
Start by making a modeltime table:
modeltime_table() to store your fitted workflowsmodel_tblmodel_tbl <- modeltime_table(
workflow_fit_lm_1_spline,
workflow_fit_lm_2_lag
)
model_tbl## # Modeltime Table
## # A tibble: 2 x 3
## .model_id .model .model_desc
## <int> <list> <chr>
## 1 1 <workflow> LM
## 2 2 <workflow> LM
As a precautionary measure, please refit the models using modeltime_refit(). This prevents models that can go bad over time because of software changes.
# Refitting makes sure your models work over time.
model_tbl <- model_tbl %>%
modeltime_refit(training(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
))Use testing data to calibrate the model:
model_tblmodeltime_calibrate() to calibrate the model using testing(splits) (out-of-sample data)calibration_tblcalibration_tbl <- model_tbl %>%
modeltime_calibrate(
training(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
)
)
calibration_tbl## # Modeltime Table
## # A tibble: 2 x 5
## .model_id .model .model_desc .type .calibration_data
## <int> <list> <chr> <chr> <list>
## 1 1 <workflow> LM Test <tibble [1,612 x 4]>
## 2 2 <workflow> LM <NA> <lgl [1]>
Use modeltime_accuracy() to calculate the accuracy metrics.
calibration_tbl %>% modeltime_accuracy()## # A tibble: 2 x 9
## .model_id .model_desc .type mae mape mase smape rmse rsq
## <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 LM Test 0.668 228. 0.714 127. 0.860 0.227
## 2 2 LM <NA> NA NA NA NA NA NA
modeltime_forecast():
new_data = testing(splits)actual_data = data_prepared_tblplot_modeltime_forecast()calibration_tbl %>%
modeltime_forecast(
new_data = testing(splits) %>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
),
actual_data = data_prepared_tbl
) %>%
plot_modeltime_forecast()Forecasting thoughts:
calibration_tblmodeltime_refit() refit the model on the data_prepared_tbl datasetrefit_tbl <- calibration_tbl %>%
modeltime_refit(data = data_prepared_tbl)refit_tblmodeltime_forecast() to forecast the new_data = forecast_tbl. Use data_prepared_tbl as the actual data.plot_modeltime_forecast()refit_tbl %>%
modeltime_forecast(new_data = forecast_tbl,
actual_data = data_prepared_tbl) %>%
plot_modeltime_forecast()Apply the inversion to the forecast plot:
refit_tbl %>%
modeltime_forecast(new_data = forecast_tbl,
actual_data = data_prepared_tbl) %>%
# Invert Transformation
mutate(across(.value:.conf_hi, .fns = ~ standardize_inv_vec(
x = .,
mean = c(mean_a, mean_b, mean_c, mean_d),
sd = c(sd_a, sd_b, sd_c, sd_d)
))) %>%
mutate(across(.value:.conf_hi, .fns = exp)) %>%
plot_modeltime_forecast()workflow_fit_glmnet_2_lag <- workflow_fit_lm_2_lag %>%
update_model(
spec = linear_reg(penalty = 0.1, mixture = 0.5) %>%
set_engine("glmnet")
) %>%
fit(training(splits)%>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
))calibration_tbl <- modeltime_table(
workflow_fit_lm_1_spline,
workflow_fit_lm_2_lag,
workflow_fit_glmnet_2_lag
) %>%
update_model_description(.model_id = 1, "LM - Spline Recipe") %>%
update_model_description(2, "LM - Lag Recipe") %>%
update_model_description(3, "GLMNET - Lag Recipe") %>%
modeltime_calibrate(testing(splits)%>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
))calibration_tbl %>% modeltime_accuracy()## # A tibble: 3 x 9
## .model_id .model_desc .type mae mape mase smape rmse rsq
## <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 LM - Spline Recipe Test 1.04 114. 0.719 146. 1.37 0.0336
## 2 2 LM - Lag Recipe Test 0.790 125. 0.547 105. 1.17 0.285
## 3 3 GLMNET - Lag Recipe Test 0.848 102. 0.588 122. 1.25 0.200
calibration_tbl %>%
modeltime_forecast(
new_data = testing(splits)%>%
arrange(
lag_ip_op_flag
, payer_grouping
, dsch_date
),
actual_data = data_prepared_tbl
) %>%
plot_modeltime_forecast()refit_tbl <- calibration_tbl %>%
modeltime_refit(data = data_prepared_tbl)refit_tbl %>%
modeltime_forecast(
new_data = forecast_tbl,
actual_data = data_prepared_tbl
) %>%
plot_modeltime_forecast()